OLAP textual aggregation approach using the Google similarity distance

Authors

Mustapha Bouakkaz

Sabine Loudcher

Youcef Ouinten

Abstract

Data warehousing and On-Line Analytical Processing (OLAP) are essential elements of decision support. In the case of textual data, decision support requires new tools, mainly textual aggregation functions, for better and faster high-level analysis and decision making. Such tools will provide textual measures to users who wish to analyse documents online. In this paper, we propose a new aggregation function for textual data in an OLAP context based on the K-means method. This approach yields aggregates that are semantically richer than those provided by classical OLAP operators. The distance used in K-means is replaced by the Google similarity distance, which takes into account the semantic similarity of keywords when aggregating them. The performance of our approach is analysed and compared to other methods such as Topkeywords, TOPIC, TuBE and BienCube. The experimental study shows that our approach achieves better performance in terms of recall, precision, F-measure, complexity and runtime.
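
As a rough illustration of the idea in the abstract, the sketch below assumes that the "Google similarity distance" refers to the Normalized Google Distance of Cilibrasi and Vitányi, computed from page-hit counts. It covers only the assignment step of a K-means-style loop (each keyword goes to its nearest seed keyword); it is not the authors' implementation. The function names, the hit counts and the index size `n` are all hypothetical.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy : number of pages containing each term
    fxy    : number of pages containing both terms
    n      : assumed total number of indexed pages
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term needs a positive hit count")
    if fxy <= 0:
        return math.inf  # terms never co-occur: treat as maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def assign_to_nearest(keywords, seeds, counts, pair_counts, n):
    """Assignment step only: attach each keyword to its closest seed keyword."""
    def dist(a, b):
        if a == b:
            return 0.0
        both = pair_counts.get(frozenset((a, b)), 0)
        return ngd(counts[a], counts[b], both, n)

    clusters = {s: [] for s in seeds}
    for kw in keywords:
        nearest = min(seeds, key=lambda s: dist(kw, s))
        clusters[nearest].append(kw)
    return clusters


# Hypothetical hit counts, invented purely for illustration.
counts = {"olap": 1000, "cube": 2000, "text": 5000, "keyword": 3000}
pair_counts = {
    frozenset(("olap", "cube")): 800,
    frozenset(("text", "keyword")): 2000,
    frozenset(("olap", "keyword")): 50,
    frozenset(("cube", "text")): 100,
}
clusters = assign_to_nearest(
    ["olap", "cube", "text", "keyword"], ["olap", "text"],
    counts, pair_counts, n=10**6,
)
```

A full K-means variant would also need an update step that picks a new representative for each cluster; since keywords have no arithmetic mean, one option is to choose the member with the smallest total distance to the others, but the abstract does not say how the authors handle this.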

Similar resources

Top_Keyword: An Aggregation Function for Textual Document OLAP

For more than a decade, researches on OLAP and multidimensional databases have generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of digital documents, there is a need for incorporating text-rich documents within multidimensional databases as well as an adapted framework for their analysis. This paper presents a new agg...

Olap aggregation function for textual data warehouse

For more than a decade, OLAP and multidimensional analysis have generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of semistructured data there is a need for incorporating text-rich document data in a data warehouse and providing adapted multidimensional analysis. This paper presents a new aggregation function for keywo...

Binary-class and Multi-class based Textual Entailment System

The article presents the experiments carried out as part of the participation in Recognizing Inference in TExt (RITE-2) @NTCIR10 for Japanese. RITE-2 has four subtasks Binary-class (BC) subtask for Japanese and Chinese, Multi-class (MC) subtask for Japanese and Chinese, Entrance Exam for Japanese and RITE4QA for Chinese. We have submitted three runs in BC subtask for Japanese (JA) (one run), Ch...

Content aggregation in natural language hypertext summarization of OLAP and Data Mining Discoveries

We present a new approach to paratactic content aggregation in the context of generating hypertext summaries of OLAP and data mining discoveries. Two key properties make this approach innovative and interesting: (1) it encapsulates aggregation inside the sentence planning component, and (2) it relies on a domain independent algorithm working on a data structure that abstracts from lexical and s...

A new last aggregation compromise solution approach based on TOPSIS method with hesitant fuzzy setting to energy policy evaluation

Utilizing renewable energies is identified as one of significant issues for economical and social significance in future human life. Thus, choosing the best renewable energy among renewable energy candidates is more important. To address the issue, multi-criteria group decision making (MCGDM) methods with imprecise information could be employed to solve these problems. The aim of this paper is ...

Journal: IJBIDM

Volume 11

Publication date: 2016